Image binarization is a common operation. For grayscale images, finding the best threshold for binarization can be a manual operation. Alternatively, algorithms can select a threshold value automatically, which is convenient for computer vision or for batch-processing a series of images.
The Otsu algorithm is the best-known thresholding algorithm. It selects the threshold that maximizes the variance between the two segmented groups of pixels. It can therefore be interpreted as a clustering algorithm: samples are pixels with a single feature, their grayscale value.
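The variance-maximization idea can be sketched from scratch (a minimal illustration, not skimage's actual implementation; `otsu_threshold` is a hypothetical helper name):

```python
import numpy as np

def otsu_threshold(image, nbins=256):
    """Minimal Otsu sketch: return the bin center that maximizes the
    between-class variance of the two pixel groups it separates."""
    counts, bin_edges = np.histogram(image.ravel(), bins=nbins)
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    counts = counts.astype(float)

    # class weights and means for every candidate split point
    w0 = np.cumsum(counts)              # pixels at or below each bin
    w1 = np.cumsum(counts[::-1])[::-1]  # pixels at or above each bin
    with np.errstate(divide='ignore', invalid='ignore'):
        mu0 = np.cumsum(counts * centers) / w0
        mu1 = (np.cumsum((counts * centers)[::-1])
               / np.cumsum(counts[::-1]))[::-1]
        # between-class variance for a split between bins i and i+1
        var_between = w0[:-1] * w1[1:] * (mu0[:-1] - mu1[1:]) ** 2
    var_between = np.nan_to_num(var_between)  # empty classes contribute 0
    return centers[:-1][np.argmax(var_between)]
```

On a clearly bimodal image, the returned threshold falls between the two modes, matching what `filters.threshold_otsu` computes.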
%matplotlib inline
import matplotlib
matplotlib.rcParams['image.interpolation'] = 'nearest'
import numpy as np
import matplotlib.pyplot as plt
from skimage import exposure, filters, io, color

ic = io.ImageCollection('FINAL_TRAINING_DATA_SET/*.jpg')
for i, image in enumerate(ic):
    im = color.rgb2gray(image)
    counts, bin_centers = exposure.histogram(im)
    val = filters.threshold_otsu(im)
    fig, axes = plt.subplots(1, 2)
    axes[0].imshow(im, cmap='gray')
    axes[0].contour(im, [val], colors='y')  # outline pixels at the threshold level
    axes[1].plot(bin_centers, counts)       # grayscale histogram
    axes[1].axvline(val, ls='--')           # threshold position on the histogram
k-means clustering uses the Euclidean distance in feature space to cluster samples. If we want to cluster together pixels of similar color, the RGB space is not well suited since it mixes together information about color and light intensity. Therefore, we first transform the RGB image into Lab colorspace, and only use the color channels (a and b) for clustering.
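To see how Lab separates intensity from color, consider neutral gray pixels (a small sketch):

```python
import numpy as np
from skimage import color

# Two gray pixels: identical "color" (neutral), very different brightness.
grays = np.array([[[0.2, 0.2, 0.2],
                   [0.8, 0.8, 0.8]]])
lab = color.rgb2lab(grays)

L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
# The brightness difference lands entirely in the L channel; the color
# channels a and b are ~0 for both pixels, so clustering on (a, b)
# treats a dark gray and a light gray as the same color.
print(L.ravel())             # two clearly different lightness values
print(a.ravel(), b.ravel())  # both ~0
```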
from sklearn.cluster import KMeans

ic_seg_images = io.ImageCollection('Segmenting_image_data_set/*.jpg')
for i, image in enumerate(ic_seg_images):
    im_lab = color.rgb2lab(image)
    # one sample per pixel, two features: the a and b color channels
    data = np.array([im_lab[..., 1].ravel(), im_lab[..., 2].ravel()])
    kmeans = KMeans(n_clusters=2, random_state=0).fit(data.T)
    seg = kmeans.labels_.reshape(image.shape[:-1])
    fig, axes = plt.subplots(1, 2)
    axes[0].imshow(image)
    axes[1].imshow(image)
    axes[1].contour(seg, colors='y')  # cluster boundary overlaid on the image
In the thresholding / vector quantization approach presented above, pixels are characterized only by their color features. However, in most images neighboring pixels correspond to the same object. Hence, information on spatial proximity between pixels can be used in addition to color information.
SLIC (Simple Linear Iterative Clustering) is a segmentation algorithm that clusters pixels in both space and color, so that nearby regions of similar color end up in the same segment.
SLIC is a superpixel algorithm: it segments an image into patches (superpixels) of neighboring pixels with similar colors. SLIC also works in the Lab colorspace. The compactness parameter controls the relative importance of distances in image space versus color space.
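A quick way to get a feel for compactness is to run SLIC with several values on a sample image (a sketch using scikit-image's bundled astronaut image; the exact segment counts will vary between versions):

```python
from skimage import data, segmentation

image = data.astronaut()  # sample RGB image bundled with scikit-image

# Low compactness: superpixels hug color edges and take irregular shapes.
# High compactness: spatial distance dominates, giving compact, grid-like
# superpixels that may cross faint color boundaries.
for compactness in (1, 10, 100):
    segments = segmentation.slic(image, n_segments=100,
                                 compactness=compactness)
    print(compactness, segments.max())  # rough number of superpixels found
```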
After the superpixel segmentation (also called oversegmentation, because we end up with more segments than we want), we can add a second clustering step to join superpixels belonging to the same region.
from skimage import segmentation

for i, image in enumerate(ic_seg_images):
    # oversegment into superpixels, then replace each superpixel by its mean color
    segments = segmentation.slic(image, n_segments=200, compactness=20)
    result = color.label2rgb(segments, image, kind='avg')
    im_lab = color.rgb2lab(result)
    data = np.array([im_lab[..., 1].ravel(),
                     im_lab[..., 2].ravel()])
    # second clustering step: k-means on the superpixels' mean colors
    kmeans = KMeans(n_clusters=5, random_state=0).fit(data.T)
    labels = kmeans.labels_.reshape(image.shape[:-1])
    color_mean = color.label2rgb(labels, image, kind='avg')
    fig, axes = plt.subplots(1, 2)
    axes[0].imshow(segmentation.mark_boundaries(image, segments))
    axes[1].imshow(segmentation.mark_boundaries(image, labels))
def image_show(image, nrows=1, ncols=1, cmap='gray', **kwargs):
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=(5, 5))
    ax.imshow(image, cmap=cmap)  # use the cmap argument instead of hardcoding 'gray'
    ax.axis('off')
    return fig, ax
for i, image in enumerate(ic_seg_images):
    image_slic = segmentation.slic(image)
    image_show(color.label2rgb(image_slic, image, kind='avg'));
from skimage import measure
measure.regionprops?
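regionprops computes per-region measurements on a label image. A minimal sketch on a made-up binary mask:

```python
import numpy as np
from skimage import measure

# Made-up mask with two separate foreground blobs
mask = np.zeros((10, 10), dtype=bool)
mask[1:4, 1:4] = True   # a 3x3 square
mask[6:9, 5:10] = True  # a 3x5 rectangle

labels = measure.label(mask)          # integer label per connected component
props = measure.regionprops(labels)   # one object per labeled region

for region in props:
    print(region.label, region.area, region.centroid)
# Two regions, with areas of 9 and 15 pixels respectively.
```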